1 Goal

Today my goal is to write a Python program that picks books at random from the ones I haven’t read on my Goodreads list, with optional filters. It should be possible to filter by publication date, maximum number of pages, minimum rating, date added, and maybe more.

import pandas as pd

df = pd.read_csv('data/day24/goodreads_library_export.csv')

df.head(5)

	Book Id	Title	Author	Author l-f	Additional Authors	ISBN	ISBN13	Average Rating	Publisher	...	Date Read	Date Added	Bookshelves	Bookshelves with positions	Exclusive Shelf	My Review	Spoiler	Private Notes	Read Count
0	51648276	Drive Your Plow Over the Bones of the Dead	Olga Tokarczuk	Tokarczuk, Olga	Antonia Lloyd-Jones, Beata Poźniak	=""	=""	3.94	Penguin Audio	...	NaN	2023/11/08	NaN	NaN	read	NaN	NaN	NaN	1
1	18112493	Parissyndromet	Heidi Furre	Furre, Heidi	NaN	="8282880035"	="9788282880039"	4.12	Flamme	...	NaN	2024/12/21	NaN	NaN	read	NaN	NaN	NaN	1
2	25489025	The Vegetarian	Han Kang	Kang, Han	Deborah Smith	="0553448188"	="9780553448184"	3.64	Hogarth	...	NaN	2024/12/21	NaN	NaN	read	NaN	NaN	NaN	1
3	28921	The Remains of the Day	Kazuo Ishiguro	Ishiguro, Kazuo	NaN	=""	=""	4.14	Faber & Faber	...	NaN	2025/07/15	NaN	NaN	read	NaN	NaN	NaN	1
4	43868109	Empire of Pain: The Secret History of the Sack...	Patrick Radden Keefe	Keefe, Patrick Radden	NaN	="0385545681"	="9780385545686"	4.54	Doubleday	...	NaN	2025/07/10	to-read	to-read (#298)	to-read	NaN	NaN	NaN	0

5 rows × 24 columns

2 Data Cleaning

First, I need to clean the data a little by removing unwanted columns and rows.

# Remove read books
to_read = df[df['Read Count'] == 0]

to_read

	Book Id	Title	Author	Author l-f	Additional Authors	ISBN	ISBN13	My Rating	Average Rating	Publisher	...	Date Read	Date Added	Bookshelves	Bookshelves with positions	Exclusive Shelf	My Review	Spoiler	Private Notes	Read Count	Owned Copies
4	43868109	Empire of Pain: The Secret History of the Sack...	Patrick Radden Keefe	Keefe, Patrick Radden	NaN	="0385545681"	="9780385545686"	0	4.54	Doubleday	...	NaN	2025/07/10	to-read	to-read (#298)	to-read	NaN	NaN	NaN	0	0
5	40163119	Say Nothing: A True Story of Murder and Memory...	Patrick Radden Keefe	Keefe, Patrick Radden	NaN	="0385521316"	="9780385521314"	0	4.47	Doubleday	...	NaN	2025/07/10	to-read	to-read (#297)	to-read	NaN	NaN	NaN	0	0
6	42683	On Writing	Ernest Hemingway	Hemingway, Ernest	Larry W. Phillips, Charles Scribner Jr.	="0684854295"	="9780684854298"	0	4.02	Scribner	...	NaN	2025/06/12	to-read	to-read (#296)	to-read	NaN	NaN	NaN	0	0
7	22816087	Seveneves	Neal Stephenson	Stephenson, Neal	NaN	=""	=""	0	4.00	William Morrow	...	NaN	2025/06/11	to-read	to-read (#295)	to-read	NaN	NaN	NaN	0	0
8	50365	A Suitable Boy (A Bridge of Leaves, #1)	Vikram Seth	Seth, Vikram	NaN	="0060786523"	="9780060786526"	0	4.11	Harper Perennial Modern Classics	...	NaN	2025/06/11	to-read	to-read (#294)	to-read	NaN	NaN	NaN	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
391	28815	Influence: The Psychology of Persuasion	Robert B. Cialdini	Cialdini, Robert B.	NaN	="006124189X"	="9780061241895"	0	4.22	Harper Business	...	NaN	2018/08/27	to-read	to-read (#5)	to-read	NaN	NaN	NaN	0	0
394	2255	Way of the Peaceful Warrior: A Book That Chang...	Dan Millman	Millman, Dan	NaN	="1932073205"	="9781932073201"	0	4.13	HJ Kramer	...	NaN	2018/08/27	to-read	to-read (#4)	to-read	NaN	NaN	NaN	0	0
396	19795	Power vs. Force: The Hidden Determinants of Hu...	David R. Hawkins	Hawkins, David R.	NaN	="1561709336"	="9781561709335"	0	4.15	Hay House	...	NaN	2018/08/21	to-read	to-read (#3)	to-read	NaN	NaN	NaN	0	0
404	566259	Fire in the Belly: On Being a Man	Sam Keen	Keen, Sam	NaN	="0553351370"	="9780553351378"	0	3.81	Bantam	...	NaN	2018/08/21	to-read	to-read (#2)	to-read	NaN	NaN	NaN	0	0
405	1052	The Richest Man in Babylon	George S. Clason	Clason, George S.	NaN	="0451205367"	="9780451205360"	0	4.23	Berkley Books	...	NaN	2018/08/21	to-read	to-read (#1)	to-read	NaN	NaN	NaN	0	0

308 rows × 24 columns
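The export also has an 'Exclusive Shelf' column, which in the rows shown above holds 'to-read' for every unread book. As a minimal sketch on a toy DataFrame (the three rows are stand-ins, not the real export), the same selection could be made on the shelf name. If a book on some other exclusive shelf also had a 'Read Count' of 0, the two filters would give different results, so it may be worth comparing them on the full file.

```python
import pandas as pd

# Toy stand-in for the Goodreads export; only the relevant columns are kept.
sample = pd.DataFrame({
    'Title': ['The Vegetarian', 'Seveneves', 'On Writing'],
    'Exclusive Shelf': ['read', 'to-read', 'to-read'],
    'Read Count': [1, 0, 0],
})

# Filter on the shelf name instead of the read counter.
by_shelf = sample[sample['Exclusive Shelf'] == 'to-read']

# The filter used in this post, for comparison.
by_count = sample[sample['Read Count'] == 0]

print(by_shelf['Title'].tolist())
print(by_shelf.equals(by_count))
```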

df.columns

Index(['Book Id', 'Title', 'Author', 'Author l-f', 'Additional Authors',
       'ISBN', 'ISBN13', 'My Rating', 'Average Rating', 'Publisher', 'Binding',
       'Number of Pages', 'Year Published', 'Original Publication Year',
       'Date Read', 'Date Added', 'Bookshelves', 'Bookshelves with positions',
       'Exclusive Shelf', 'My Review', 'Spoiler', 'Private Notes',
       'Read Count', 'Owned Copies'],
      dtype='object')

# Columns that I want to keep
columns = ['Title', 'Author', 'Average Rating', 'Publisher',
       'Number of Pages', 'Original Publication Year', 'Date Added']

to_read = to_read[columns]

to_read.head(5)

	Title	Author	Average Rating	Publisher	Number of Pages	Original Publication Year	Date Added
4	Empire of Pain: The Secret History of the Sack...	Patrick Radden Keefe	4.54	Doubleday	535.0	2021.0	2025/07/10
5	Say Nothing: A True Story of Murder and Memory...	Patrick Radden Keefe	4.47	Doubleday	441.0	2018.0	2025/07/10
6	On Writing	Ernest Hemingway	4.02	Scribner	160.0	1984.0	2025/06/12
7	Seveneves	Neal Stephenson	4.00	William Morrow	872.0	2015.0	2025/06/11
8	A Suitable Boy (A Bridge of Leaves, #1)	Vikram Seth	4.11	Harper Perennial Modern Classics	1474.0	1993.0	2025/06/11

# Remove NaN values
to_read = to_read.dropna()

to_read.head(5)

	Title	Author	Average Rating	Publisher	Number of Pages	Original Publication Year	Date Added
4	Empire of Pain: The Secret History of the Sack...	Patrick Radden Keefe	4.54	Doubleday	535.0	2021.0	2025/07/10
5	Say Nothing: A True Story of Murder and Memory...	Patrick Radden Keefe	4.47	Doubleday	441.0	2018.0	2025/07/10
6	On Writing	Ernest Hemingway	4.02	Scribner	160.0	1984.0	2025/06/12
7	Seveneves	Neal Stephenson	4.00	William Morrow	872.0	2015.0	2025/06/11
8	A Suitable Boy (A Bridge of Leaves, #1)	Vikram Seth	4.11	Harper Perennial Modern Classics	1474.0	1993.0	2025/06/11
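Note that `dropna()` without arguments drops a row if any of the kept columns is missing. A small sketch on toy data (the missing values here are invented for illustration) shows how to count the missing values first and how `subset` limits the drop to the columns that really must be present:

```python
import numpy as np
import pandas as pd

# Toy data with one missing publisher and one missing page count.
sample = pd.DataFrame({
    'Title': ['Seveneves', 'Utz', 'On Writing'],
    'Publisher': ['William Morrow', None, 'Scribner'],
    'Number of Pages': [872.0, 154.0, np.nan],
})

# Count missing values per column before dropping anything.
missing = sample.isna().sum()

# Drop rows only when the page count is missing; a missing publisher is kept.
cleaned = sample.dropna(subset=['Number of Pages'])

print(missing.to_dict())
print(len(sample) - len(cleaned), 'row(s) dropped')
```

If rows with a missing publisher were kept like this, any later `str.contains` filter on that column would need `na=False` so that the missing values count as non-matches.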

I notice that some of the columns are of type float, and I want them to be integers instead. 'Date Added' is also stored as plain text (object), so I convert it to a datetime.

to_read.dtypes

Title                         object
Author                        object
Average Rating               float64
Publisher                     object
Number of Pages              float64
Original Publication Year    float64
Date Added                    object
dtype: object

to_read = to_read.astype({'Number of Pages': int, 'Original Publication Year': int})
to_read['Date Added'] = pd.to_datetime(to_read['Date Added'])
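The plain `int` conversion only works because the rows with NaN were dropped first. As a hedged alternative, pandas' nullable 'Int64' dtype can hold missing values, and passing an explicit `format` to `pd.to_datetime` removes any guessing about the date layout. The sketch below uses toy data with one invented missing page count:

```python
import pandas as pd

# Toy data; the missing page count is invented for illustration.
sample = pd.DataFrame({
    'Number of Pages': [535.0, None, 160.0],
    'Date Added': ['2025/07/10', '2025/06/12', '2018/08/21'],
})

# 'Int64' (capital I) is the nullable integer dtype: NaN becomes <NA>.
sample['Number of Pages'] = sample['Number of Pages'].astype('Int64')

# The export writes dates as year/month/day, so spell that out.
sample['Date Added'] = pd.to_datetime(sample['Date Added'], format='%Y/%m/%d')

print(sample.dtypes)
```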

3 Creating the random book picker function

def random_book(df, options: int = 1, title: str = None, author: str = None, min_rating: float = 0, publisher: str = None, min_year: int = None, max_year: int = None, added_year: int = None, added_month: int = None):
    """Return up to `options` random rows of df that match the given filters."""
    if title is not None:
        df = df.loc[df['Title'].str.contains(title, case=False)]

    if author is not None:
        df = df.loc[df['Author'].str.contains(author, case=False)]
        if df.empty:
            print("You haven't saved any books that you want to read by that author")
            return

    if min_rating is not None:
        df = df.loc[df['Average Rating'] >= min_rating]

    if publisher is not None:
        # Case-insensitive, like the title and author filters
        df = df.loc[df['Publisher'].str.contains(publisher, case=False)]

    if min_year is not None:
        df = df.loc[df['Original Publication Year'] >= min_year]

    if max_year is not None:
        df = df.loc[df['Original Publication Year'] <= max_year]

    if added_year is not None:
        # The original condition only filtered when the year was out of range
        df = df.loc[df['Date Added'].dt.year == added_year]

    if added_month is not None:
        if added_month > 12 or added_month < 1:
            print('Month out of range, choose a number between 1 and 12')
            return
        df = df.loc[df['Date Added'].dt.month == added_month]

    # random.randint(0, -1) raises a ValueError, so stop early when nothing is left
    if df.empty:
        print('No books match the chosen filters')
        return

    # Sample without replacement so the same book is never suggested twice
    return df.sample(n=min(options, len(df)))

4 Testing

random_book(to_read, added_month=2030)

Month out of range, choose a number between 1 and 12

random_book(to_read, title='japan')

	Title	Author	Average Rating	Publisher	Number of Pages	Original Publication Year	Date Added
50	Bushido: The Soul of Japan	Inazō Nitobe	3.84	Kodansha USA	160	1899	2024-04-21

random_book(to_read, min_year=1800, max_year=1940)

	Title	Author	Average Rating	Publisher	Number of Pages	Original Publication Year	Date Added
374	The Brothers Karamazov	Fyodor Dostoevsky	4.39	Farrar, Straus and Giroux	796	1880	2018-11-10

random_book(to_read, min_rating=4.1)

	Title	Author	Average Rating	Publisher	Number of Pages	Original Publication Year	Date Added
64	My Traitor's Heart: A South African Exile Retu...	Rian Malan	4.25	Grove Press	349	1990	2023-04-05

random_book(to_read, author='Murakami')

You haven't saved any books that you want to read by that author

random_book(to_read, options=3)

	Title	Author	Average Rating	Publisher	Number of Pages	Original Publication Year	Date Added
32	Utz	Bruce Chatwin	3.67	Penguin Publishing Group	154	1988	2024-12-29
321	Nine Chains to the Moon	R. Buckminster Fuller	3.85	Southern Illinois University Press	384	1963	2020-10-19
179	Swann’s Way (In Search of Lost Time, #1)	Marcel Proust	4.16	Penguin Classics	468	1913	2021-09-10

5 Conclusion

There we have it, a simple random book picker.

It isn’t optimized for speed, however: every filter re-assigns the DataFrame, whereas I could have collected all the conditions first and applied them to the DataFrame in a single filtering operation.
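A minimal sketch of that idea, assuming a hypothetical helper `build_mask` and a `max_pages` filter that the function above does not have: each active condition is AND-ed into one boolean mask, and the DataFrame is indexed only once. Whether this is measurably faster for a few hundred rows is untested.

```python
import pandas as pd

def build_mask(df, min_rating=None, max_pages=None, min_year=None, max_year=None):
    """Combine the active filters into one boolean mask (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    if min_rating is not None:
        mask &= df['Average Rating'] >= min_rating
    if max_pages is not None:
        mask &= df['Number of Pages'] <= max_pages
    if min_year is not None:
        mask &= df['Original Publication Year'] >= min_year
    if max_year is not None:
        mask &= df['Original Publication Year'] <= max_year
    return mask

# Three rows taken from the outputs shown earlier in this post.
books = pd.DataFrame({
    'Title': ['Utz', 'Seveneves', 'The Brothers Karamazov'],
    'Average Rating': [3.67, 4.00, 4.39],
    'Number of Pages': [154, 872, 796],
    'Original Publication Year': [1988, 2015, 1880],
})

mask = build_mask(books, min_rating=3.9, max_pages=800)
matches = books[mask]  # the DataFrame is filtered only once
print(matches['Title'].tolist())
```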

The function also takes a lot of parameters; using *args and **kwargs instead could be an option.
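One possible shape for a **kwargs version, sketched under assumptions: the `FILTERS` lookup table, the `pick_books` name and the `max_pages` keyword are all hypothetical and only cover the numeric filters. Unknown keywords raise an error instead of being silently ignored, which is one of the risks of accepting arbitrary keyword arguments.

```python
import pandas as pd

# Hypothetical lookup: keyword name -> (column, Series comparison method).
FILTERS = {
    'min_rating': ('Average Rating', 'ge'),
    'max_pages': ('Number of Pages', 'le'),
    'min_year': ('Original Publication Year', 'ge'),
    'max_year': ('Original Publication Year', 'le'),
}

def pick_books(df, options=1, **filters):
    """Apply keyword filters, then return up to `options` random rows."""
    for name, value in filters.items():
        if name not in FILTERS:
            raise TypeError(f'Unknown filter: {name}')
        column, method = FILTERS[name]
        # Series.ge and Series.le are the method forms of >= and <=.
        df = df[getattr(df[column], method)(value)]
    if df.empty:
        return df
    return df.sample(n=min(options, len(df)))

# Three rows taken from the outputs shown earlier in this post.
books = pd.DataFrame({
    'Title': ['Utz', 'Seveneves', 'The Brothers Karamazov'],
    'Average Rating': [3.67, 4.00, 4.39],
    'Number of Pages': [154, 872, 796],
    'Original Publication Year': [1988, 2015, 1880],
})

result = pick_books(books, options=5, min_year=1900, max_pages=200)
print(result['Title'].tolist())
```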

It would also have been better if there were an API for one’s own Goodreads library, so that I wouldn’t have to download a CSV file whenever new books are added. This was just a for-fun coding task, however.

I’m also missing a ‘genre’ column, which would be nice to filter books by.